(11) EP 1 443 417 A1

### (12) **EUROPEAN PATENT APPLICATION**

(43) Date of publication: 04.08.2004 Bulletin 2004/32

(51) Int Cl.7: **G06F 15/78**, G11C 16/30

- (21) Application number: 03425054.8
- (22) Date of filing: 31.01.2003
- (84) Designated Contracting States:

  AT BE BG CH CY CZ DE DK EE ES FI FR GB GR
  HU IE IT LI LU MC NL PT SE SI SK TR
  Designated Extension States:

  AL LT LV MK RO
- (71) Applicant: STMicroelectronics S.r.l. 20041 Agrate Brianza (Milano) (IT)
- (72) Inventors:
  - Borgatti, Michele, 41035 Finale Emilia (Modena) (IT)
  - Cali, Lorenzo, 20045 Besana Brianza (MI) (IT)
  - Lertora, Francesco, 16040 San Colombano Certenoli (Genova) (IT)
  - Pasotti, Marco, 27028 San Martino Siccomario (Pavia) (IT)
  - Rolandi, Pier Luigi, 15059 Monleale (Alessandria) (IT)
- (74) Representative: Botti, Mario, Botti & Ferrari S.r.l., Via Locatelli, 5, 20124 Milano (IT)

## (54) A reconfigurable signal processor with embedded flash memory device

(57) The present invention relates to a dynamically reconfigurable processing unit (1) including an embedded Flash memory device (4) for non-volatile storage of code, data and bit-streams, the unit (1) being integrated into a single chip together with a microprocessor (2) core. Advantageously, the processing unit further comprises an S-RAM based embedded FPGA unit (3) structured for FPGA reconfigurations, having a specific programming interface (7) connected to a port (FP) of said Flash memory device (4) through a DMA channel (8).



#### Description

### Field of invention

[0001] The present invention relates to a dynamically reconfigurable processing unit tightly connected to a Flash EEPROM memory subsystem.

[0002] More specifically, the invention relates to a reconfigurable signal processing IC with an embedded Flash memory device for non-volatile storage of code, data and bit-streams, the unit being integrated into a single chip together with a microprocessor core.

#### Prior art

[0003] As is well known to those skilled in this technical field, the increasing complexity of system design and shorter time-to-market requirements are leading research towards hybrid systems that include processors enhanced by programmable logic.

[0004] In this respect, reference is made to the work by Young-Don Bae et al., "A Single-Chip Programmable Platform Based on a Multithreaded Processor and Configurable Logic Clusters", ISSCC 2002 Digest of Technical Papers, pp. 336-337, Feb. 2002.

[0005] A further relevant reference is the article by Zhang et al. titled "A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications", ISSCC 2000 Digest of Technical Papers, pp. 68-69, 488, Feb. 2000.

[0006] At the same time, the rising cost of mask sets and the shorter time-to-market available for new products are leading to the introduction of systems with a higher degree of programmability and configurability, such as systems-on-chip with configurable processors, embedded FPGAs and embedded Flash memory.

[0007] Moreover, the availability of an advanced embedded Flash technology based on a NOR architecture, together with innovative IPs such as embedded Flash macrocells with special features, is a key factor.

[0008] For a better understanding of the present invention, reference is also made to Field Programmable Gate Array (FPGA) technology, which combines standard processors with embedded FPGA devices.

[0009] These solutions allow exactly the required peripherals to be configured into the FPGA at deployment time, and exploit temporal re-use by dynamically reconfiguring the instruction set at run time according to the algorithm currently being executed.

[0010] The existing models for designing FPGA/processor interaction can be grouped into two main categories:

- the FPGA is a co-processor communicating with the main processor through a system bus or a specific I/O channel;
- the FPGA is described as a function unit of the processor pipeline.

[0011] The first group includes the GARP processor, known from the article by T. Callahan, J. Hauser and J. Wawrzynek titled "The Garp architecture and C compiler", IEEE Computer, 33(4): 62-69, April 2000. A similar architecture is provided by the A-EPIC processor, disclosed in the article by S. Palem and S. Talla titled "Adaptive explicit parallel instruction computing", Proceedings of the Fourth Australasian Computer Architecture Conference (ACOAC), January 2001.

[0012] In both cases the FPGA is addressed via dedicated instructions that move data explicitly to and from the processor. Control hardware is kept to a minimum, since no interlocks are needed to avoid hazards, but communication requires a significant overhead in clock cycles.

[0013] The communication overhead can be considered negligible only when the number of cycles per FPGA execution is relatively high.
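To see why, consider a minimal model in which each FPGA invocation costs a fixed number of communication cycles (moving operands to the FPGA and results back) plus a number of execution cycles. The cycle counts below are hypothetical and are not taken from Garp, A-EPIC or any other cited design.

```python
def overhead_fraction(exec_cycles: int, comm_cycles: int) -> float:
    """Fraction of the cycles per FPGA invocation spent on communication.

    exec_cycles: cycles the FPGA spends computing per invocation (hypothetical)
    comm_cycles: cycles spent moving operands in and results out (hypothetical)
    """
    if exec_cycles < 0 or comm_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    total = exec_cycles + comm_cycles
    if total == 0:
        raise ValueError("at least one cycle count must be positive")
    return comm_cycles / total


if __name__ == "__main__":
    comm = 20  # hypothetical fixed transfer cost per invocation
    for exec_cycles in (5, 50, 500, 5000):
        frac = overhead_fraction(exec_cycles, comm)
        print(f"{exec_cycles:5d} exec cycles -> {frac:.1%} communication overhead")
```

With a hypothetical 20-cycle transfer cost, the overhead falls from 80% of the cycles at 5 execution cycles to below 1% at 5000, which is the regime in which the coprocessor model pays off.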

[0014] In the commercial world, FPGA suppliers such as Altera Corporation offer digital architectures based on US Patent No. 5,968,161 to T.J. Southgate, "FPGA based configurable CPU additionally including second programmable section for implementation of custom hardware support".

[0015] Other suppliers (Xilinx, Triscend) offer chips in which a processor is embedded on the same silicon IC as the FPGA logic. See, for instance, US Patent 6,467,009 to S.P. Winegarden et al., "Configurable Processor System Unit", assigned to Triscend Corporation.

[0016] However, in those chips the processor and the FPGA are generally loosely coupled through a high-speed dedicated bus and behave as two separate execution units rather than being merged into a single architectural entity. As a result, the FPGA has no direct access to the processor memory subsystem, which is one of the strengths of the academic approaches outlined above.

[0017] In the second category (FPGA as a function unit) we find the architectures known as PRISC, Chimaera and ConCISe.

[0018] In all these models, data are read from and written to the processor register file directly, minimizing the communication overhead. In most cases, to minimize control logic and hazard handling and to fit within the processor pipeline stages, the FPGA is limited to combinatorial logic only, which severely limits the achievable performance boost.

[0019] These solutions represent a significant step toward a low-overhead interface between the two entities. Nevertheless, due to the granularity of FPGA operations and its hardware-oriented structure, their approach is still very coarse-grained, which reduces the achievable parallelism in resource usage and, again, exposes hardware issues that are neither familiar nor friendly to software compilation tools and algorithm developers.

[0020] Thus, a relevant drawback of this approach is the memory data-access bottleneck, which often forces long FPGA stalls while enough data are fetched into the shared registers to justify its activation.

[0021] The technical problem underlying the present invention is that of providing a new kind of reconfigurable processing unit, tightly connected to a memory architecture, having functional and structural features capable of offering significant performance and energy-consumption improvements with respect to a traditional signal processing device.

### Summary of invention

[0022] The invention overcomes the limitations of similar earlier architectures by relying on an embedded device of a different nature and on a new approach to the processor/memory interface.

[0023] According to a first embodiment of the present invention, the reconfigurable processing unit targets the image/voice processing and recognition application domains by combining a configurable and extensible processor core with an SRAM-based embedded FPGA.

[0024] More specifically, the processing unit according to the invention further includes an S-RAM based embedded FPGA unit structured for FPGA reconfigurations, having a specific programming interface connected to a port FP of said Flash memory device through a DMA channel.

[0025] The features and advantages of the processing unit according to this invention will become apparent from the following description of a best mode for carrying out the invention, given by way of non-limiting example with reference to the enclosed drawings.

### Brief description of the drawings

[0026]

Figure 1 is a block diagram of a processing unit architecture for data processing according to the present invention;

Figure 2 is a block diagram of a Flash memory architecture embedded into the processing unit of Figure 1;

Figure 3 is a schematic view of the system memory hierarchy provided by the present invention;

Figure 4 is a block diagram of a specific processor extension, for instance examples of added DSP instructions;

Figure 5 is a block diagram of a further specific processor extension, for instance a datapath for an optimized fixed-point square-root calculation;

Figure 6 is a table showing the overall performance improvements for a face recognition task implemented by the processing unit of the present invention;

Figure 7 is a schematic chip micrograph.

#### Detailed description

[0027] With reference to the drawing views, reference numeral 1 generally indicates a processing unit realized according to the present invention for digital signal processing based on reconfigurable computing.

[0028] The processing unit 1 includes an embedded Flash memory device 4 for non-volatile storage of code, data and bit-streams and a further S-RAM based embedded FPGA unit 3 realized for the configuration purposes of the present invention.

[0029] More specifically, an 8Mb application-specific embedded Flash memory device 4 is disclosed. The memory device 4 is integrated into a single chip together with a microprocessor 2 and the FPGA structure 3.

[0030] Advantageously, application-specific hardware units are added and dynamically modified by reconfiguring the embedded FPGA 3. By implementing application-specific vector processing instructions, the processing unit 1 reaches a peak computing power of 1GOPS.

[0031] Efficient read-write-erase access to code, data and FPGA bitstreams is provided by the Flash memory device 4 based on a modular 8Mb, 4-bank Flash memory, as will be more clearly explained hereinafter.

[0032] The processing unit 1 comprises three content-specific I/O ports and delivers an aggregate peak read throughput of 1.2GB/s.

[0033] The system architecture 1 is illustrated in Figure 1.

[0034] The functional purposes of the embedded FPGA 3 are:

- i) extension of the processor datapath supporting a set of additional special-purpose C-callable microprocessor instructions;
- ii) bus-mapped coprocessors, connected to the system bus through a master/slave interface;
- iii) flexible I/O to connect external units or sensors with application-specific communication protocols.

[0035] Although these different purposes would ideally call for different kinds of programmable logic, best suited to either arithmetic-dominated or control-dominated functions, a single programmable logic subsystem 3 has been implemented and is shared among the different purposes both in space (within the same configuration) and in time (across subsequent configurations).

[0036] The single, high-I/O-count, fine-grain e-FPGA 3 operates as a datapath for the microprocessor pipeline and as dedicated control logic for the bus coprocessors and the I/O control interface. The FPGA has a specific programming interface 7 connected to a port FP of said Flash memory device 4 through a DMA channel 8.

[0037] FPGA reconfiguration is concurrent to software execution.

[0038] A local bus 6 connects a dedicated 32-bit Flash memory port FP to the FPGA programming interface 7.

[0039] A DMA channel 8 handles the bitstream transfer while the microprocessor 2 fetches instructions and data from different Flash memory ports: the 64-bit-wide code port (CP) and data port (DP).

[0040] To support streaming applications, a 1kB dual-port buffer 9 is used to interface fast decoding hardware and slower software running on the processor 2.
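The source does not say how buffer 9 is organized. One common arrangement, assumed here purely for illustration, is a circular FIFO in which the hardware side owns the write pointer and the software side owns the read pointer, so each side touches only its own port. The class and method names are hypothetical.

```python
class DualPortRingBuffer:
    """Circular FIFO modelling a dual-port buffer (hypothetical organization).

    The producer (fast decoding hardware) only advances `_wr`, the consumer
    (software on the processor) only advances `_rd`, so each side owns one
    pointer, as it would own one port of a dual-port RAM.
    """

    def __init__(self, capacity: int = 1024):
        self._mem = bytearray(capacity)
        self._cap = capacity
        self._wr = 0  # total bytes ever written
        self._rd = 0  # total bytes ever read

    def used(self) -> int:
        return self._wr - self._rd

    def free(self) -> int:
        return self._cap - self.used()

    def write(self, data: bytes) -> int:
        """Store as many bytes as fit and return how many were accepted.

        A short count tells the producer to stall until software drains data.
        """
        n = min(len(data), self.free())
        for i in range(n):
            self._mem[(self._wr + i) % self._cap] = data[i]
        self._wr += n
        return n

    def read(self, max_bytes: int) -> bytes:
        """Remove and return up to max_bytes of the oldest data."""
        n = min(max_bytes, self.used())
        out = bytes(self._mem[(self._rd + i) % self._cap] for i in range(n))
        self._rd += n
        return out


if __name__ == "__main__":
    buf = DualPortRingBuffer()                   # 1kB, as in the description
    accepted = buf.write(bytes(range(256)) * 5)  # 1280 bytes offered
    print("accepted", accepted)                  # only 1024 fit
    first = buf.read(100)
    print("read", len(first), "bytes, free space now", buf.free())
```

When `write` returns less than requested, the producer must stall or retry, which is how such a buffer absorbs the rate mismatch without losing data.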

[0041] The memory sub-system architecture is shown in Figure 2.

[0042] The modular structure of the memory (dotted line) includes:

- charge pumps 10 (Power Block);
- testability circuits 11 (DFT);
- a power management arbiter 12 (PMA); and,
- a customizable array 13 of N independent 2Mb flash memory modules 16.

[0043] The number N may be chosen according to the storage requirements; N=4 in the current implementation.

[0044] The modular memory features (N+2) 128-bit target ports and implements an N-bank uniform memory.

[0045] As previously mentioned, three content-specific ports are dedicated to code (CP, 64-bit wide), data (DP, 64-bit) and FPGA bitstream configurations (FP, 32-bit). A 128-bit sub-system crossbar 15 connects all the architecture blocks and the eight-bit microprocessor 2.

[0046] The main features of the Flash memory device 4 are: sharing of the charge pumps 10 among the different Flash memory modules 16 through the PMA arbiter 12 in a multi-bank fashion; the use of a small eight-bit microprocessor 2 to ease memory-system testing and to add complex data-management functionalities; and the use of an ADC (Analog-to-Digital Converter), required by the application, to increase the system self-test capability.

[0047] The third port, FP, of the Flash device 4 is dedicated to managing the embedded-FPGA (e-FPGA) configuration data stored in the Flash memory modules. The FP port is read-only and provides fast sequential access for bitstream downloading.

[0048] The FP port has four configuration registers, replicating the information stored in the CP port, which must be used in order to write e-FPGA configuration data.

[0049] The output data word bus and the address bus are 32 bits wide. The FP port uses a chip select to access the addressable memory space, and a burst enable to allow burst serial access.

[0050] During a read operation, an output ready signal is driven low when data are not immediately available, so that it acts as a wait-state signal.
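The ready/wait-state handshake can be illustrated with a small cycle-level model: once chip select and burst enable are asserted, the initiator accepts a 32-bit word only in cycles where ready is high and simply waits otherwise. The timing used here (a first-access latency, for example four cycles, which would correspond to a 40ns access at 100MHz, followed by one word per cycle) and all function names are hypothetical; the source does not give the FP port timing.

```python
from typing import Iterator, List, Tuple


def fp_port_model(words: List[int], first_latency: int) -> Iterator[Tuple[bool, int]]:
    """Yield (ready, data) once per clock cycle for a hypothetical burst read.

    ready stays low for `first_latency` cycles (data not yet available),
    then one 32-bit word is delivered per cycle.
    """
    for _ in range(first_latency):
        yield False, 0
    for w in words:
        yield True, w & 0xFFFFFFFF


def burst_read(port: Iterator[Tuple[bool, int]], count: int) -> Tuple[List[int], int, int]:
    """Collect `count` words and return (data, total_cycles, wait_states)."""
    data: List[int] = []
    cycles = waits = 0
    for ready, word in port:
        cycles += 1
        if not ready:  # ready low acts as a wait-state signal
            waits += 1
            continue
        data.append(word)
        if len(data) == count:
            break
    return data, cycles, waits


if __name__ == "__main__":
    bitstream = [0xA5A5A5A5, 0x12345678, 0xDEADBEEF, 0x00000001]
    data, cycles, waits = burst_read(fp_port_model(bitstream, first_latency=4), len(bitstream))
    print(f"{len(data)} words in {cycles} cycles ({waits} wait states)")
```

Because the initiator never needs to know the latency in advance, the same loop works whatever pattern of wait states the memory produces.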

[0051] The eight-bit microprocessor 2 (uP) performs additional complex functions (defragmentation, compression, virtual erase, etc.) that are not natively supported by the DP port, and assists in the built-in self-test of the memory system.

[0052] The (N+2)x4 128-bit crossbar 15 connects the modular memory with the four initiators (CP, DP, FP and uP), ensuring that at least three Flash memory modules 16 can be read in parallel at full speed.

[0053] The memory space of the four modules 16 is arranged in three programmable user-defined partitions, each devoted to one port. The memory system clock can run at up to 100MHz, and reading three modules 16 in parallel, each with a 128-bit data bus and a 40ns access time, gives a peak read throughput of 1.2GB/s.

[0054] Each 2Mb Flash memory module 16 has a program/erase control unit and a 128-bit I/O data bus with a 40ns access time, which yields 400MB/s per module. Simultaneous memory operations rely on the power management arbiter 12 (PMA) for optimal scheduling.
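The per-module and aggregate throughput figures follow from simple arithmetic, assuming decimal units (1MB = 10^6 bytes) and one 128-bit word delivered per 40ns access with no additional pipelining; under these assumptions the quoted numbers come out exactly.

```python
def module_read_bandwidth(bus_bits: int, access_time_ns: float) -> float:
    """Bytes per second from one module: one bus-wide word per access."""
    return (bus_bits / 8) / (access_time_ns * 1e-9)


if __name__ == "__main__":
    per_module = module_read_bandwidth(bus_bits=128, access_time_ns=40)
    aggregate = 3 * per_module  # three modules read in parallel, one per port
    print(f"per module: {per_module / 1e6:.0f} MB/s")  # 400 MB/s
    print(f"aggregate : {aggregate / 1e9:.1f} GB/s")   # 1.2 GB/s
```

The factor of three reflects the three partitions being served by different modules at the same time; if two ports hit the same module, the aggregate would drop accordingly.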

[0055] Conflicting resource requests are scheduled within a single clock cycle, taking into account the available power and user-defined priorities.

[0056] The memory device 4 allows up to four simultaneous operations, with at most one write and one erase at a time.
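The scheduling constraints can be sketched as a greedy arbiter evaluated once per clock cycle: requests are visited in user-defined priority order and granted while the charge-pump power budget allows, with at most four operations in flight and at most one write and one erase. The power costs, the budget, the one-operation-per-module rule and the greedy policy itself are assumptions made for illustration; the source does not describe the actual PMA algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    module: int    # Flash module index
    op: str        # "read", "write" or "erase"
    priority: int  # user-defined; lower value = more urgent
    power: int     # hypothetical charge-pump load, arbitrary units


MAX_CONCURRENT = 4                     # up to four simultaneous operations
MAX_PER_OP = {"write": 1, "erase": 1}  # at most one write and one erase


def arbitrate(requests: List[Request], power_budget: int) -> List[Request]:
    """Grant a conflict-free subset of requests for this clock cycle."""
    granted: List[Request] = []
    busy_modules = set()
    used_power = 0
    counts = {"write": 0, "erase": 0}
    for req in sorted(requests, key=lambda r: r.priority):
        if len(granted) == MAX_CONCURRENT:
            break
        if req.module in busy_modules:  # assumption: one operation per module
            continue
        if req.op in MAX_PER_OP and counts[req.op] >= MAX_PER_OP[req.op]:
            continue
        if used_power + req.power > power_budget:  # pumps cannot supply it
            continue
        granted.append(req)
        busy_modules.add(req.module)
        used_power += req.power
        if req.op in counts:
            counts[req.op] += 1
    return granted


if __name__ == "__main__":
    reqs = [
        Request(module=0, op="read", priority=0, power=1),
        Request(module=1, op="write", priority=1, power=4),
        Request(module=2, op="write", priority=2, power=4),  # second write: refused
        Request(module=3, op="erase", priority=3, power=6),  # exceeds budget
        Request(module=2, op="read", priority=4, power=1),
    ]
    for g in arbitrate(reqs, power_budget=8):
        print(g.module, g.op)
```

A refused request is simply presented again in the next cycle, so lower-priority operations are delayed rather than dropped.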

[0057] Figure 3 depicts the memory hierarchy and parallelism across the processing unit 1. The ports CP and DP are interfaced to the 64-bit, 800MB/s AHB system bus 6.

[0058] At a system clock rate of 100MHz each I/O port can operate independently at maximum speed, so an aggregate peak read rate of 1.2GB/s can be sustained, the limit being set by the memory access time.

[0059] In the current implementation, e-FPGA reconfiguration takes 500µs at 100MHz; the e-FPGA configuration interface 7 currently sustains an average throughput of 50MB/s out of the 400MB/s available.
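Both the 400MB/s available on the 32-bit FP port and the 800MB/s of the 64-bit AHB bus follow from port width times the 100MHz clock. The bitstream size is not stated in the source; multiplying the quoted average rate by the reconfiguration time gives only a rough, derived estimate, under the assumption that the 50MB/s average holds for the whole 500µs.

```python
CLOCK_HZ = 100e6  # system/memory clock from the description


def port_peak_bandwidth(width_bits: int, clock_hz: float = CLOCK_HZ) -> float:
    """Peak bytes/s of a port transferring one word per clock cycle."""
    return width_bits / 8 * clock_hz


if __name__ == "__main__":
    cp_dp = port_peak_bandwidth(64)  # CP/DP and the 64-bit AHB bus
    fp = port_peak_bandwidth(32)     # FP configuration port
    print(f"64-bit port: {cp_dp / 1e6:.0f} MB/s, 32-bit FP port: {fp / 1e6:.0f} MB/s")

    # Derived estimate, not stated in the source: data moved during one
    # reconfiguration at the quoted average rate.
    reconfig_s = 500e-6
    avg_rate = 50e6
    print(f"implied data per reconfiguration: ~{avg_rate * reconfig_s / 1e3:.0f} kB")
    print(f"FP port utilization: {avg_rate / fp:.1%}")
```

On these numbers the configuration interface uses 12.5% of the FP port's peak bandwidth.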

[0060] System performance is being evaluated for an image processing application (facial recognition) and a speech recognition application.

[0061] More than 20 specific instructions were designed as C/assembly-callable functions, automatically translated to RTL, then synthesized and mapped to the e-FPGA.

[0062] Figures 4 and 5 show two examples of specific microprocessor extensions.

[0063] Figure 4 relates to an eight-issue, eight-bit L2 calculation, which accounts for 23 eight-bit arithmetic operations and six 64-bit operations and requires about 10k ASIC equivalent gates.
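One plausible reading of the eight-issue, eight-bit L2 calculation is a sum of squared differences over eight pairs of 8-bit samples: eight subtractions, eight squarings and a seven-addition reduction tree give exactly the 23 eight-bit operations quoted, although the source does not spell out the datapath. The sketch below is a software reference model under that assumption; it ignores the six 64-bit operations and the exact widths of intermediate results, and the function names are hypothetical.

```python
from typing import Sequence


def l2_step8(a: Sequence[int], b: Sequence[int]) -> int:
    """Squared L2 distance of eight unsigned 8-bit pairs (assumed semantics)."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expects eight operands per vector")
    if any(not 0 <= x <= 255 for x in (*a, *b)):
        raise ValueError("operands must be unsigned 8-bit values")
    diffs = [x - y for x, y in zip(a, b)]  # 8 subtractions
    squares = [d * d for d in diffs]       # 8 multiplications
    # 7 additions arranged as a balanced tree (4 + 2 + 1)
    while len(squares) > 1:
        squares = [squares[i] + squares[i + 1] for i in range(0, len(squares), 2)]
    return squares[0]


def l2_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Full-length squared distance built from repeated eight-wide steps."""
    if len(u) != len(v) or len(u) % 8:
        raise ValueError("vectors must have equal length, a multiple of 8")
    return sum(l2_step8(u[i:i + 8], v[i:i + 8]) for i in range(0, len(u), 8))


if __name__ == "__main__":
    face = [10, 20, 30, 40, 50, 60, 70, 80] * 2
    probe = [12, 18, 30, 45, 50, 59, 70, 83] * 2
    print(l2_distance(face, probe))
```

A model like this can serve as the golden reference when checking a custom instruction mapped to the e-FPGA against the plain C implementation it replaces.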

[0064] Figure 5 relates to a datapath for an optimized fixed-point square-root calculation, which accounts for twelve 32-bit operations and about 2k ASIC equivalent gates.
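For reference, a common hardware-friendly way to compute a fixed-point square root is the digit-by-digit (bit-pair) method, which needs only shifts, additions, subtractions and comparisons. The source does not say which algorithm the Figure 5 datapath uses; the sketch below is the standard integer version for an unsigned 32-bit operand, and the Q16.16 example is an assumed format chosen for illustration.

```python
def isqrt32(x: int) -> int:
    """floor(sqrt(x)) for 0 <= x < 2**32, digit-by-digit (shift/subtract)."""
    if not 0 <= x < 1 << 32:
        raise ValueError("operand must be an unsigned 32-bit value")
    result = 0
    bit = 1 << 30  # highest power of four that fits in 32 bits
    while bit > x:
        bit >>= 2
    while bit:
        if x >= result + bit:
            x -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result


if __name__ == "__main__":
    # Unsigned fixed point: a raw value with 2f fractional bits has an integer
    # root with f fractional bits. 2.25 in Q16.16 gives a root in Q8.8.
    raw = int(2.25 * (1 << 16))  # 147456
    root = isqrt32(raw)
    print(root, root / (1 << 8))  # 384 1.5
```

Each loop iteration produces one result bit, so a 32-bit operand needs at most 16 iterations; an unrolled datapath trades those iterations for area.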

[0065] The overall performance improvements for the face recognition tasks are shown in the table of Figure 6.

[0066] Execution time is compared between a 32-bit RISC with basic DSP extensions (MAC, zero-overhead loops, etc.) and the same processor enhanced with application-specific instructions.

[0067] Measured speed-ups range from 1.8x to 10.6x (on the most demanding task), with an overall improvement of 8.5x. It should be noted that switching between algorithm stages requires only one reconfiguration of the e-FPGA. Reconfiguration time is negligible.
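The overall figure is the ratio of total execution times, not an average of the per-stage speed-ups: because face recognition accounts for most of the baseline time, its ×10.6 dominates the ×8.5 total. The short script below recomputes the ratios from the execution times listed in Figure 6; small differences (face detection gives about ×3.9 against the listed ×4) come from rounding in the table.

```python
# Execution times from the table of Figure 6, in seconds:
# (RISC with basic DSP support, RISC with microprocessor extensions)
STAGES = {
    "Bayer filter": (0.058, 0.0247),
    "Edge detection": (0.0045, 0.0025),
    "Face detection": (1.5, 0.382),
    "Face recognition": (9.15, 0.860),
}
TOTALS = (10.7, 1.26)  # totals row of Figure 6


def speedup(t_base: float, t_ext: float) -> float:
    """Ratio of baseline to extended execution time."""
    return t_base / t_ext


if __name__ == "__main__":
    for name, (tb, te) in STAGES.items():
        share = tb / TOTALS[0]
        print(f"{name:17s} x{speedup(tb, te):5.1f}  ({share:.1%} of baseline time)")
    print(f"{'Overall':17s} x{speedup(*TOTALS):5.1f}")
```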

[0068] The speed-up factors take into account the possible multi-cycle clock penalty due to processor-FPGA synchronization when instruction extensions are slower than the processor clock. Figure 6 also reports energy-efficiency figures.

[0069] Since the average power consumption of the system extended with the e-FPGA is only slightly higher (by 10-15%), executing each task on its specific HW configuration reduces energy (power-delay product improvement), with an overall reduction of 6.7x.

[0070] Only one task (edge detection) showed a slightly worse total execution energy, although it still benefited in execution speed.

[0071] The last column of Figure 6 reports the energy-delay improvement of each specific HW configuration compared with its general-purpose counterpart. The energy required for e-FPGA reconfiguration is always negligible.
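The energy-delay figure combines the two gains multiplicatively: if a task runs s times faster and uses g times less energy, its energy × delay product shrinks by s·g. A quick check against the first three rows of Figure 6:

```python
def energy_delay_gain(speedup: float, energy_gain: float) -> float:
    """Improvement of the energy x delay product.

    E*D scales as (E_base / g) * (T_base / s), so the gain is s * g.
    """
    return speedup * energy_gain


# (speed-up, energy gain, listed energy x delay gain) from Figure 6
ROWS = {
    "Bayer filter": (2.3, 1.4, 3.22),
    "Edge detection": (1.8, 0.95, 1.71),
    "Face detection": (4.0, 2.9, 11.6),
}

if __name__ == "__main__":
    for name, (s, g, listed) in ROWS.items():
        print(f"{name:15s} computed x{energy_delay_gain(s, g):.2f}, listed x{listed}")
```

Edge detection shows why the product is useful: its energy gain is below one, yet the speed-up still yields a net energy-delay improvement.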

[0072] Measurements show a best-case energy efficiency of several MOPS/mW at a 1.8V supply, which lies between that of conventional ASIP/DSP implementations and that of dedicated configurable hardware implementations.

[0073] The complete processing unit is implemented on a single chip in a $0.18\mu m$, 2PL-6ML CMOS embedded Flash technology, with a chip area of $70mm^2$. Technology and device characteristics are summarized in Figure 6, while a chip micrograph is shown in Figure 7.

### Claims

1. A dynamically reconfigurable processing unit (1) including an embedded Flash memory device (4) for non-volatile storage of code, data and bit-streams, the unit (1) being integrated into a single chip together with a microprocessor (2) core, further comprising an S-RAM based embedded FPGA unit (3) structured for FPGA reconfigurations having a specific programming interface (7) connected to a port (FP) of said Flash memory device (4) through a DMA channel (8).

2. A dynamically reconfigurable processing unit according to claim 1, wherein said DMA channel (8) handles the bitstream transfer while said microprocessor (2) fetches instructions and data from different Flash memory ports of said Flash memory device (4): a wide code port (CP) and a data port (DP).

3. A dynamically reconfigurable processing unit according to claim 2, wherein said Flash memory device (4) includes a modular array structure (13) comprising N memory blocks (16), and wherein a power block (10), including charge pumps, is shared among different Flash memory modules (16) through a PMA arbiter (12) in a multi-bank fashion.

4. A dynamically reconfigurable processing unit according to claim 1, wherein said embedded FPGA unit (3) exploits the following functions:
   - iv) extension of the processor datapath supporting a set of additional special-purpose C-callable microprocessor instructions;
   - v) bus-mapped coprocessors, connected to the system bus through a master/slave interface;
   - vi) flexible I/O to connect external units or sensors with application-specific communication protocols.

5. A dynamically reconfigurable processing unit according to claim 2, wherein said Flash memory device (4) includes at least three different access ports, each for a specific function:
   - said code port (CP) optimized for random access time and the application system;
   - said data port (DP) allowing an easy way to access and modify application data; and,
   - said FPGA port (FP) offering serial access for a fast download of bit streams for embedded FPGA (e-FPGA) configurations.

6. A dynamically reconfigurable processing unit according to claim 2, wherein said third port (FP) comprises four configuration registers replicating the information stored in said code port (CP) that must be used in order to write e-FPGA configuration data.

7. A dynamically reconfigurable processing unit according to claim 5, wherein said third port (FP) uses a chip select to access the addressable memory space and a burst enable to allow burst serial access.

8. A dynamically reconfigurable processing unit according to claim 1, wherein said connection between said interface (7) and said port (FP) is provided by a local bus (6).

9. A dynamically reconfigurable processing unit according to claim 5, wherein said Flash memory device (4) includes four modules (16) each arranged in at least three programmable user-defined partitions, each one devoted to a corresponding port.







FIG. 3



| Algorithm stage | RISC with basic DSP instruction support | RISC with microprocessor extensions | Speed-up | Energy gain | Energy efficiency gain (energy × delay) |
|---|---|---|---|---|---|
| Bayer filter | 58 msec | 24.7 msec | × 2.3 | × 1.4 | × 3.22 |
| Edge detection | 4.5 msec | 2.5 msec | × 1.8 | × 0.95 | × 1.71 |
| Face detection | 1.5 sec | 382 msec | × 4 | × 2.9 | × 11.6 |
| Face recognition (twenty-face database) | 9.15 sec | 860 msec | × 10.6 | × 9 | × 95.4 |
| Totals | 10.7 sec | 1.26 sec | × 8.5 | × 6.7 | |

FIG. 6



FIG. 7



## **EUROPEAN SEARCH REPORT**

EP 03 42 5054

| Category | Citation of document with indication, where appropriate, of relevant passages | Relevant to claim | Classification of the application (Int.Cl.7) |
|---|---|---|---|
| X | … AL) 15 October 2002, … 6 - column 35, line 15, * figures 1,2 * | 1-9 | G06F15/78, G11C16/30 |
| A | "…architecture dealing with future mobile telecommunications constraints", PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, PROCEEDINGS INTERNATIONAL, 15-19 APRIL 2002, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC, US, 15 April 2002 (2002-04-15), pages 156-163, XP010591206, ISBN: 0-7695-1573-8, … 22, * figure 5 * | 1,2 | |
| A | US 5 693 570 A (LEE …) 2 December 1997 (1997-12-02), * figure 2.A * | 3 | |
| A | "XC6200 Field Programmable Gate Arrays Product Description", XILINX, 24 April 1997, XP002262656, * page 31 * | 7 | |

Technical fields searched (Int.Cl.7): G06F, G11C

The present search report has been drawn up for all claims.

| Place of search | Date of completion of the search | Examiner |
|---|---|---|
| THE HAGUE | 26 November 2003 | Bosch Vivancos, P |

Category of cited documents:

- X: particularly relevant if taken alone
- Y: particularly relevant if combined with another document of the same category
- A: technological background
- O: non-written disclosure
- T: theory or principle underlying the invention
- E: earlier patent document, but published on, or after the filing date
- D: document cited in the application
- L: document cited for other reasons

## ANNEX TO THE EUROPEAN SEARCH REPORT ON EUROPEAN PATENT APPLICATION NO.

EP 03 42 5054

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report. The members are as contained in the European Patent Office EDP file on 26-11-2003. The European Patent Office is in no way liable for these particulars, which are merely given for the purpose of information.

| Patent document cited in search report | Kind | Publication date | Patent family member(s) | Publication date |
|----------------------------------------|------|------------------|-------------------------|------------------|
| US 6467009 | B1 | 15-10-2002 | AU 6431999 A<br>GB 2361559 A, B<br>WO 0022546 A2 | 01-05-2000<br>24-10-2001<br>20-04-2000 |
| US 5693570 | A | 02-12-1997 | US 5621685 A<br>US 5508971 A<br>US 5563825 A<br>US 5568424 A<br>US 5592420 A | 15-04-1997<br>16-04-1996<br>08-10-1996<br>22-10-1996<br>07-01-1997 |

For more details about this annex : see Official Journal of the European Patent Office, No. 12/82



(11) Publication number: 0 668 659 A2

(12)

# **EUROPEAN PATENT APPLICATION**

(21) Application number: 95301002.2

(51) Int. Cl. 6: H03K 19/177

(22) Date of filing: 16.02.95

(30) Priority: 17.02.94 GB 9403030

(43) Date of publication of application: 23.08.95 Bulletin 95/34

(84) Designated Contracting States: AT BE CH DE DK ES FR GB GR IE IT LI LU MC NL PT SE

(71) Applicant: Pilkington Germany (no. 2) Limited, Prescot Road, St. Helens, Merseyside WA10 3TT (GB)

(72) Inventor: Austin, Kenneth Brockhurst Hall, Brockhurst Way Northwich, Cheshire, CW9 8AL (GB)

(74) Representative: Cardwell, Stuart Martin et al, Roystons, Tower Building, Water Street, Liverpool, Merseyside L3 1BA (GB)

## (54) Reconfigurable ASIC.

A configurable semi-conductor integrated circuit, with particular application as a re-configurable application specific device. In order to be able to rapidly switch between two or more, preferably several, configurations, the invention provides a configurable semi-conductor integrated circuit in which an area (1) thereof is formed with a plurality of cells (2) each having at least one function and interconnections with at least some other said cells (2). At least some of the plurality of cells have interconnections (25) which are electrically selectable as to their conduction state, and at least some of the plurality of cells have interconnections (YA-YD) which are pre-wired. Each cell has two or more possible configurations, each configuration being defined by the cell function and/or its interconnection with other cells according to cell configuration data, and further comprising means (36, 38, 40) storing configuration data for at least two cell configurations (per cell) and means (30, 32, 34, 42, 48) to enable one of the possible cell configurations according to the cell configuration data selected.


The present invention relates to a configurable integrated circuit, with particular emphasis on a re-configurable application specific device but without limitation to same.

Micro-processors are designed into many applications because of their low cost and high performance. However, for many applications such as image compression and digital signal processing they are too slow. Modifications to the basic micro-processor architecture have led to several new devices: digital signal processors (DSP), reduced instruction set computers (RISC) and custom processors (CP). Each of these devices is optimised to perform a restricted number of tasks but at very high speed. Many applications require several types of such devices to achieve the necessary level of performance, either because different types of computational task must be performed over a period of time or because of the limited capability of each device. Essentially these devices are used as low cost high performance numerical engines, each optimised to implement a general class of algorithms. However, a designer frequently requires a different architecture to efficiently implement a new algorithm, and the usual practice in such circumstances is to design a custom processor for the task. This leads to long and expensive design cycles and does not allow the designer any flexibility to change the algorithm.

Field programmable gate arrays (FPGAs) are commonly used to replace standard products and they could be used as a numerical engine. However, they are general purpose devices that cannot efficiently implement high speed circuits. In order to achieve the level of complexity that is normally required, several FPGAs would be necessary, which would increase the cost of the final system. Some FPGAs are configured using on-chip static random access memory (SRAM), and these devices can be re-programmed to perform different tasks, which could lead to greater flexibility and higher levels of performance. However, these devices are connected to an external source of configuration data that is accessed by the device to configure internal resources. The time to configure or re-configure the FPGA can be several milliseconds, due to the necessity to import configuration data from an external source, and this time is several orders of magnitude too slow: reconfiguration speeds of less than 100 nanoseconds are required for high performance applications. As such, FPGAs cannot be reconfigured fast enough to make them suitable for use as a high performance numerical engine. In FPGAs a considerable amount of silicon area is committed to the configuration memory which is required to program interconnect resources. Whilst in theory FPGAs could accommodate an additional configuration by increasing the amount of on-chip memory available to hold configuration data, this would probably increase the size of the chip by 60 per cent, which would be prohibitive for high density arrays.
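As a rough worked check of the "several orders of magnitude" figure, take 1 ms as an illustrative FPGA reconfiguration time (an assumption; the text says only "several milliseconds") against the 100 ns target:

$$
\frac{t_{\text{FPGA}}}{t_{\text{target}}} = \frac{1\ \text{ms}}{100\ \text{ns}} = \frac{10^{-3}\ \text{s}}{10^{-7}\ \text{s}} = 10^{4}
$$

so even at the low end of the quoted range the external reload is about four orders of magnitude slower than required.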

The aim of the invention is to provide a re-configurable architecture which can rapidly switch between two or more, preferably several, configurations. Another aim of this invention is to provide a device that is specifically optimised to carry out functions for numerically intensive applications. Another aim is to provide a device that, prior to the application of power, contains one or more boot up primary configurations suitable for configuring the device into the intended application. A further aim is to produce a device that has provision for passing data between successive configurations of the (base) device. A still further aim is to ensure that during configuration of the device, data is held in a safe condition and that switching currents are minimised. A still further aim is to provide a configuration cache that will allow updating of configuration memories that are not currently in use. Another aim of the invention is to allow the device to select its own configuration from an external source of configuration data.

Another aim is to reduce the number of programmable interconnections by pre-wiring a portion of the logic into the required configuration.

A yet further aim is to increase performance of the device by pre-arranging specified primary functions to specific areas of the device such primary functions being substantially pre-wired.

Accordingly one aspect of the invention provides a configurable semi-conductor integrated circuit in which an area thereof is formed with a plurality of cells each having at least one function and interconnections with at least some other said cells, at least some of the plurality of cells having interconnections which are electrically selectable as to their conduction state, and at least some of the plurality of cells having interconnections which are pre-wired, each cell has two or more possible configurations, each configuration being defined by the cell function and/or its interconnection with other cells according to cell configuration data, and further comprising means storing configuration data for at least two cell configurations (per cell) and means to enable one of the possible cell configurations according to the cell configuration data selected.

By pre-wired in relation to interconnect we mean uninterruptable as to its conduction state. The configuration data controls selection of the cell function and/or cell interconnections, preferably using decoders, or alternatively is applied directly from memory. Thus, for example, the cell's configuration data determines the routing of the signal through the cell. Direct connection paths exist between the configuration stores, the decoders and the selectable functions and interconnections. The term function as used herein may be a logic function, arithmetic function, or interconnect function. A cell may have one or more of these functions or a combination of two or more of these. Preferably the configuration data stores are disposed in the cell. The configuration data stores are selected using an instruction bus receiving signals from a sequencer and controller. One or more of the configurations may be pre-wired (i.e. not programmable). Advantageously one or more of the configuration data stores are programmable using a data transfer bus. Where more than one store is programmable, an instruction update bus is provided to write enable the required configuration store. Configuration stores not currently accessed to control interconnection and/or cell function can be updated using the instruction update bus.

Since the present invention is particularly concerned with an application specific device which is optimised to perform a restricted number of tasks at high speed, but which is quickly reconfigurable during program execution (when required) to perform some other specific task, cells are optimised for a primary function according to a primary configuration. Advantageously the primary configuration data is pre-wired. It is convenient to have two alternate pre-wired primary configurations. Cells can be, and most usually will be, optimised for different primary functions. Advantageously the pre-wired interconnections are used in connection with the optimised functions.

A possible primary function is that of an adder. Another aspect of the invention provides a multi-bit adder for summing at least two multi-bit words comprising a first multi-bit adder block for summing the least significant bits and at least one further multi-bit adder block for summing the most significant bits and having sum selection means wherein said further multi-bit adder block calculates the two possible sums resulting from a carry out from the previous block being equal to '0' and '1' respectively and wherein the sum selection means selects the sum of the further multi-bit adder block according to the carry out calculated from the previous block.

In the case of a Digital Signal Processor application, some cells will be optimised as Arithmetic Logic Units (ALU) while other cells may be optimised to carry out functions such as instruction decode or to act as processor registers. The number of different cells is only limited by the size of the array of cells. In practice the array will be divided into a number of discrete areas that are particularly efficient at implementing respective primary functions. It will be apparent that each of these cells has the capability to implement another function, and usually a range of other functions, according to other configurations. These additional functions are controlled by the controller and sequencer, whose role is to ensure that the correct function is available when required. Primary functions may use general interconnect resources, but preferably they have their own dedicated resource for high speed connections between primary functions of other cells. In this way the performance of the device is not dependent on a general programmable interconnect resource, and by connecting primary functions through resources with smaller parasitic loads, the device can operate faster.

In order to safeguard data when changing between configurations, each cell has a latch controlled by a function control bit. Transient current is reduced when switching between configurations by the provision of a buffer in each cell, the buffer being controllable as to its state during reconfiguration by a control line.

It will be apparent that whilst this device has specific application in the field of numerical engines such as DSPs, the primary functions can be chosen to suit other applications. Accordingly the techniques can be applied to any application. For example, another application is as a programmable communications device.

Another aspect of the invention also provides a method of configuring a configurable semi-conductor integrated circuit in which a sequence is programmed with data to facilitate selection of a required configuration from at least two possibilities. Usually each of a plurality of cells will have at least two configuration possibilities. Advantageously the configurations are programmable and the method further comprises inputting and storing configuration data. A further advantageous feature is the ability to program the sequences to write over previously stored configuration data at a prescribed point in operation of the circuit. An aspect of the invention provides a semi-conductor integrated circuit in which the circuit configuration is changed according to a pre-programmed sequence of configuration during operation of the device.

The present invention will now be described by way of example only with reference to the accompanying drawings; in which:-

Figure 1 is a schematic layout for a re-configurable application specific device embodying the invention;

Figures 2 and 3 illustrate diagrammatically the feature of the core architecture having different configurations and sequential access;

Figure 4 illustrates diagrammatically the feature of the core having cells which are optimised to implement specific functions;

Figure 5 illustrates diagrammatically a primary configuration for the device as a Digital Signal Processor (DSP);

Figure 6 Illustrates diagrammatically a secondary configuration for the device as a large multiplier;

Figure 7 illustrates schematically the layout of a cell including configuration memory means;

Figure 8a illustrates diagrammatically the possible arrangement of the cells in blocks with optimised functions;

Figure 8b illustrates schematically programmable local and global interconnect resources for the cells;

Figures 9a and 9b illustrate diagrammatically how the global interconnect resources are connected to the cell input and output multiplexers;

Figures 9c and 9d illustrate diagrammatically an array of cell blocks and the arrangement of cells within a cell block;

Figure 10 illustrates diagrammatically cell output state control;

Figures 11, 12 and 13 illustrate diagrammatically three logic cell variants, namely an Arithmetic Logic Unit function (ALU), an Accumulator function (ACC), and a Decode cell function respectively;

Figure 14 illustrates diagrammatically examples of different functions from the ALU and ACC optimised core cells;

Figure 15 illustrates diagrammatically details of configurable Static Random Access Memory provisions;

Figure 16 illustrates diagrammatically further details of the cell configuration memory;

Figure 17 illustrates diagrammatically instruction bus connections for DSP cells;

Figure 18 illustrates diagrammatically a novel parallel carry select adder architecture which can be configured by the device;

Figure 19 illustrates a cell configured to implement a single stage carry select adder;

Figure 20 illustrates a cell configured to implement two carry select adders;

Figure 21 illustrates an alternative cell configuration to implement a single stage carry select adder, and

Figure 22 illustrates diagrammatically a DSP Timing Diagram.

The present invention is described in the context of an integrated circuit intended for an application specific device, and will be described by way of example in the specific context of a Digital Signal Processor (DSP). According to the invention the device is not restricted to a fixed architecture, but has re-configurable hardware that allows the device (eg. DSP) to be optimised for each individual task. Thus at a macro level the device may be optimised for a new application, for example MPEG, Polygon Engine, Blitter or DMA Engine, whilst at a micro level the device can be optimised for each OPCODE, eg. MULTIPLE ALU, CUSTOM MULTIPLY. Thus a re-configurable application specific device (eg. DSP) allows many custom devices to be replaced with a single chip. Optimised OPCODES increase performance. In effect the device can switch at clock speed between operating as a DSP, RISC or custom processor.

Referring firstly to Figure 1, there is illustrated a re-configurable application specific digital signal processor. The chip includes an area 1 of core cells, Partitioned Static Random Access Memory (SRAM) 3, a sequencer and controller 5 having control lines 7, clocks 9 and clock lines 11, as well as programmable input/output 13 and associated data bus 15. Also shown are a signal Decompress decoder 17, a communications link 19 and associated input/output and Expansion porting 21, and address bus 23.

There are a plurality of core cells 2 and these provide for example (in the case of a DSP configuration), Instruction Decode, registers, programme counter and stack pointer facilities. Each core cell can be programmed to perform a range of functions and certain core cells are optimised to implement specific functions. Thus, for example, reference to Figure 4 illustrates optimisation of certain cells for ALU functions as at 2a, registers 2b, programme counter 2c, general counter 2d, instruction decode 2e and input/output 2f.

One schematic configuration of a core cell, denoted by dotted outline, is shown in Figure 7; the core cell includes within it a logic cell 22 having selectable functions (for example four). Programmable core cell inputs (eight), ie. electrically selectable interconnections, are shown at 25 applied to two 4:1 input multiplexers 26, 28. The cell output is shown at 27. Examples of logic cell configurations are described further with reference to Figures 11, 12, 13 and 14. The input multiplexers are controlled by respective 2-4 Decoders 30, 32. A further 2-4 Decoder 34 controls a 4-1 Multiplexer in the logic cell 22, and an output multiplexer 70 is controlled by a 2-4 Decoder 48. Direct pre-wired connections to the logic cell are indicated by YA-YD.
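The decoder-driven selection in the Figure 7 cell can be sketched behaviourally in Python. This is an illustration under stated assumptions, not the patent's circuit: the eight programmable inputs 25 are assumed to be split four per input multiplexer, and the helper names `decode` and `mux` are hypothetical. The same one-hot scheme extends to the 3-8 decoder of Figure 17 by setting `width=3`.

```python
def decode(select: int, width: int = 2) -> list[int]:
    """Binary-to-one-hot decoder: a `width`-bit select value drives exactly one of 2**width lines."""
    if not 0 <= select < 2 ** width:
        raise ValueError(f"select must fit in {width} bits")
    return [1 if line == select else 0 for line in range(2 ** width)]


def mux(inputs: list[int], enables: list[int]) -> int:
    """Multiplexer driven by one-hot enables: passes through the single enabled input."""
    if len(inputs) != len(enables) or sum(enables) != 1:
        raise ValueError("need one enable per input and exactly one active enable")
    return inputs[enables.index(1)]


# Hypothetical values on the eight cell inputs 25.
# ASSUMPTION: inputs 0-3 feed input multiplexer 26, inputs 4-7 feed input multiplexer 28.
cell_inputs = [0, 1, 1, 0, 1, 0, 0, 1]
mux26_out = mux(cell_inputs[0:4], decode(2))  # 2-bit field for Decoder 30 selects input 2
mux28_out = mux(cell_inputs[4:8], decode(3))  # 2-bit field for Decoder 32 selects input 7
print(mux26_out, mux28_out)                   # 1 1
print(decode(5, width=3))                     # [0, 0, 0, 0, 0, 1, 0, 0]
```

Modelling the decoder output as an explicit one-hot list mirrors the hardware, where each decoder line enables exactly one multiplexer path.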

In the Figure 7 illustration the cell includes configurable memory provisions comprising configuration cache 36 and instruction cache 38, as well as so called "hard wired" or fixed configuration provisions 40. For the DSP application the fixed configurations comprise a primary DSP Boot Configuration set by 3 x 2 bit configuration elements 40a, and a secondary configuration eg. Multiplier configuration set by 3 x 2 bit configuration elements 40b. It is intended that the primary (fixed) configuration will be implemented automatically on boot-up of the device so as to give it its primary application specific function.

The configuration cache 36 in the illustrated embodiment comprises four 3 x 2 bit data stores 36a-d, which can be write enabled from an instruction update bus 44 and written with data from data bus 46. The instruction cache 38 comprises 8 x 2 bit data stores which are write enabled from the instruction update bus 44 and written with data from the data bus 46. The instruction cache 38 is read enabled from the instruction select bus 42. A 2-4 Decoder 48, enabled from the instruction select bus 42, selects and read enables one of the four data stores 36a-d according to the data store of the instruction cache selected. The output of Decoder 48 also facilitates the direct configuration of the logic cell by controlling the 4:1 output multiplexer 70. Also illustrated is a function control bit 50, which has connections from the read and write enable lines (42, 44) and into the logic cell 22. The function control bit 50 controls latch 54 (see Figure 10).
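The two-level read path described above (an instruction-cache entry whose 2-bit value, decoded by Decoder 48, picks one of the four configuration stores 36a-d) can be sketched as a small behavioural model. It rests on assumptions: a 3 x 2 bit configuration word is represented as a tuple of three 2-bit fields, and the mapping of fields to Decoders 30, 32 and 34 is not stated in the text, so it is illustrative only. Class and method names are hypothetical.

```python
from dataclasses import dataclass, field

# A "3 x 2 bit" configuration word: three 2-bit fields.
# ASSUMPTION: the fields drive Decoders 30, 32 and 34 in that order (illustrative only).
ConfigWord = tuple[int, int, int]


def check_2bit(value: int) -> int:
    """Reject anything that does not fit in a 2-bit store."""
    if not 0 <= value <= 3:
        raise ValueError("value must fit in 2 bits")
    return value


@dataclass
class CellConfigMemory:
    # Configuration cache 36: four stores (36a-d), each holding one 3 x 2 bit word.
    config_cache: list = field(default_factory=lambda: [(0, 0, 0)] * 4)
    # Instruction cache 38: eight 2-bit entries, each naming one of the four stores.
    instruction_cache: list = field(default_factory=lambda: [0] * 8)

    def write_config(self, store: int, word: ConfigWord) -> None:
        """Write one configuration store (write enable from update bus 44, data from bus 46)."""
        if len(word) != 3:
            raise ValueError("a configuration word has three 2-bit fields")
        self.config_cache[check_2bit(store)] = tuple(check_2bit(f) for f in word)

    def write_instruction(self, entry: int, store: int) -> None:
        """Point instruction-cache entry `entry` at configuration store `store`."""
        if not 0 <= entry < 8:
            raise ValueError("the instruction cache has eight entries")
        self.instruction_cache[entry] = check_2bit(store)

    def read(self, instruction_select: int) -> ConfigWord:
        """Select bus 42 picks an instruction-cache entry; its 2-bit value, decoded by
        Decoder 48, read-enables one of the four configuration stores."""
        if not 0 <= instruction_select < 8:
            raise ValueError("the instruction cache has eight entries")
        return self.config_cache[self.instruction_cache[instruction_select]]


cell = CellConfigMemory()
cell.write_config(1, (2, 3, 1))  # hypothetical contents for store 36b
cell.write_instruction(5, 1)     # hypothetical: entry 5 names store 36b
print(cell.read(5))              # (2, 3, 1)
```

The indirection means eight instruction slots can share four stored configurations, which is why only 2 bits are needed per instruction-cache entry.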

Figure 16 illustrates, for the fixed configuration provisions 40 and the configuration cache 36, the read (42), write (44') and data (46') connections. Note that only the configuration cache 36 has both read and write provisions.

Reverting back to Figures 2 and 3, each of blocks 2', 2" and 2"' represents a configuration of the core 2. Large blocks of functionality are accessed as a series of configurations. Each new configuration receives data from the last using inter-process connections 52 and cells 54 designated for latching critical data. Other cells 54 are designated to act as inputs or outputs. Reconfiguration time can be of the order of 10 nsec. The core architecture is optimised to implement each OPCODE. This allows the word size of each arithmetic function to be adjusted to the size required. Thus, referring to Figure 3, a first core configuration (OPCODE 1) executes a 16 bit multiply and cos function, a second core configuration (OPCODE 2) carries out a 32 x 32 bit multiply function, and a third configuration (OPCODE 3) carries out a 64 bit ADD function.
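To illustrate how a large function can be spread over successive configurations with latched data passed between them, here is a hypothetical Python sketch: a 64-bit addition performed as two passes of a 32-bit adder configuration, with the partial sum and carry held in latch cells between passes. The two-pass split is an assumption made for illustration; the text's own example (OPCODE 3) performs the 64 bit ADD in a single configuration sized for it. The function names and the dictionary standing in for latch cells are hypothetical.

```python
MASK32 = (1 << 32) - 1


def add32_pass(latch: dict) -> dict:
    """Hypothetical configuration: add one 32-bit slice of a and b plus the latched carry.
    The partial sum and the new carry are left in latch cells for the next configuration."""
    step = latch["step"]
    a_slice = (latch["a"] >> (32 * step)) & MASK32
    b_slice = (latch["b"] >> (32 * step)) & MASK32
    total = a_slice + b_slice + latch["carry"]
    return {
        **latch,
        "sum": latch["sum"] | ((total & MASK32) << (32 * step)),
        "carry": total >> 32,
        "step": step + 1,
    }


def run_sequence(configurations, latch: dict) -> dict:
    """Apply each configuration in turn; only the latched state survives a reconfiguration."""
    for configure in configurations:
        latch = configure(latch)
    return latch


state = {"a": 2**64 - 1, "b": 1, "sum": 0, "carry": 0, "step": 0}
result = run_sequence([add32_pass, add32_pass], state)
print(hex(result["sum"]), result["carry"])  # 0x0 1
```

Each configuration is a pure function of the latched state, which makes explicit that nothing else is assumed to persist across a reconfiguration.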

Reference is now made to Figure 10 which illustrates the output state control as applicable to the like of the cell illustrated in Figure 7 and the corresponding cell components appropriately referenced are illustrated with the exception of the instruction cache 38.

As has been mentioned above certain cells are designated for latching critical data and hence the cells have a latch provision 54 with inputs from the function control bit 50 and a hold input line 56. These function to preserve the state of data from cells between configurations. In addition a buffer 60 is provided in order to reduce transient current when switching between configurations by setting its output state to a known condition.
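The latch and buffer behaviour can be pictured with a toy Python model. It is a sketch under assumptions, not the circuit of Figure 10: a single `hold` flag stands for the combined effect of the function control bit 50 and hold line 56, and the buffer's "known condition" during reconfiguration is taken to be logic 0. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellOutputStage:
    """Toy model of latch 54 and buffer 60 in a cell output path.
    ASSUMPTIONS: `hold` combines function control bit 50 and hold line 56, and the
    buffer's known condition during reconfiguration is logic 0."""
    latched: int = 0

    def step(self, logic_out: int, hold: bool, reconfiguring: bool) -> tuple[int, int]:
        if not hold:
            self.latched = logic_out  # latch is transparent and follows the logic cell
        buffer_out = 0 if reconfiguring else logic_out  # parked at a known level while switching
        return self.latched, buffer_out


stage = CellOutputStage()
print(stage.step(1, hold=False, reconfiguring=False))  # (1, 1): normal operation
print(stage.step(0, hold=True, reconfiguring=True))    # (1, 0): data held, buffer parked
```

The two outputs separate the two concerns in the text: preserving critical data across configurations (latch) and limiting switching transients (buffer).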

The cells' interconnect resources are now described with reference to Figures 8a, 8b, 9a and 9b. Figures 8a and 8b show diagrammatically how cells might be arranged in regular blocks (B) (eg. rows and columns), with the blocks including cells which are optimised for different functions. Thus Figure 8b shows columns of ACC cells, ALU cells and shift cells, and two rows of Decode cells. Columns of cells each have two global (Y) buses (Y1, Y2, Y3, Y4.....YN-1, YN) and the rows of cells each have at least two global (X) buses (X1, X2.....Xn-1, Xn). The Decode cells head up the columns of each block and have three X buses. Bus switches BS are provided in the Y buses between adjacent blocks. In addition there are hidden (or pre-wired direct connection) Y buses, YA-YD. These run from the decode cells to all the cells in the column below. In addition, local direct connection paths are preferred between cells. Thus, taking as an example cell SC in Figure 8b, it has input connections from the outputs of an upper adjacent cell, a lower adjacent cell, a right adjacent cell, a left adjacent cell, and a next left adjacent cell. These connections are designated U, D, R, L, J. Not all cell variations will necessarily have all the local connections. The majority of these local connections are electrically selectable as to their conduction state, but most usually the left adjacent connection will be a pre-wired connection.

Figure 9a illustrates, for one cell as for all core cells, how an input multiplexer 26 controls selection of inputs from X and Y buses and an output multiplexer 70 controls selection of outputs to the same X buses and next column of Y buses.

The cells are arranged in 10 x 8 blocks, and an example of such an array of cell blocks is illustrated in Figure 9c. Blocks 100 are formed in an 8 x 4 array; a programmable input/output 102, data buses and switches 104 and partitioned SRAM 106 are also shown. Each block 100 comprises an array of 10 x 8 cells and, conveniently, columns of cells within the block have a similar primary configuration. For example, Figure 9d illustrates a block 100 having two columns of cells 100a and 100b configured as multiplexer cells, column 100c as a product adder, column 100d as barrel shifter cells, column 100e as arithmetic and logic cells, column 100f as accumulator cells, and columns 100g and 100h configured as multiplier expansion cells. The columns in each block are headed up by decode cells.
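The array dimensions above imply the total number of core cells. A quick worked count in Python, assuming the 10 x 8 figure counts every cell in a block (the text does not say whether the heading decode cells are included):

```python
BLOCK_ARRAY = (8, 4)       # Figure 9c: blocks 100 arranged in an 8 x 4 array
CELLS_PER_BLOCK = (10, 8)  # each block 100: an array of 10 x 8 cells

blocks = BLOCK_ARRAY[0] * BLOCK_ARRAY[1]
cells_per_block = CELLS_PER_BLOCK[0] * CELLS_PER_BLOCK[1]
total_cells = blocks * cells_per_block
print(blocks, cells_per_block, total_cells)  # 32 80 2560
```

The eight columns named in Figure 9d (100a to 100h) are consistent with reading each block as 10 rows by 8 columns, though that orientation is an inference rather than something the text states.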

Referring now to Figure 15, the configurable static random access memory (SRAM) 3 stores partition data passed to it from the sequencer and controller 5 along partition data bus 72. The operation of the DSP requires the storing and retrieving of data and the provision of the SRAM on the device ensures that access to the stored data is faster than if the SRAM was located externally.

The sequencer and controller 5 controls the operation of buses 42, 44, 45 and 46. Hence the sequencer and controller 5 controls the selection of individual data stores of cells, the sending of data to the stores, and the sequence in which configuration data stored within each cell is implemented. The necessary control instructions for the sequencer and controller 5 are provided by an external source of memory (not shown). In addition to the above operations, the controller 5 can select individual data stores not currently in use so that they can be updated with new configurations from the external memory.

Figures 11, 12 and 13 illustrate respective ALU, ACC and Decode cell variants. Appropriate references have been used as previously referred to.

Figure 13 shows an example of a cell optimised for decode. Two decode cells will head up the blocks of cells as shown in Figures 8a and 8b. The illustrated variation is the one which has the pre-wired interconnections YA, YB which feed down to each of the cells below. The other decode cell will generate the YC, YD pre-wired interconnections. Thus the ALU type cells of Figure 11 have pre-wired connections YA, YB, whilst the ACC type cells have pre-wired connections YA, YB, YC, YD. Note also that for the ALU and ACC variants the left adjacent connection L is pre-wired, and for the ALU cell the Cin, Cout is a pre-wired interconnection running the length of the column of cells. Other X and Y buses are as described above.

Control signals from the outputs of the decode and for inputs of the cell variants will be pre-wired for the optimised cell functions, ie. for any functions which are known to be needed for the specific application.

Figure 14 illustrates some of the different functions which are available from the ACC and ALU core cells of Figures 12 and 11 respectively.

Figure 17 illustrates an alternative internal cell arrangement for the case of DSP cells (shown simplified), with the cell input shown simply at 25 and cell output at 27. The memory comprises 8 x 3 bit data stores, and a 3-8 Decoder 80 is provided such that one of the eight selectable options (eg. functions or interconnect) contained in the logic cell can be selected. In order to update a particular data store within a particular cell there is provided a memory select 45 (omitted from the illustrations of the previously described cell arrangement); hence the required cell can be selected, and the particular data store to be write enabled or read enabled is selected by the instruction update bus (44) or instruction bus (42). Data is written to the data store from memory data bus (46) (not illustrated in Figure 17).
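A behavioural sketch of the Figure 17 memory, assuming each of the eight 3-bit stores holds the code of one selectable option and that the contents of the read-enabled store feed the 3-8 Decoder 80 directly. The class and method names are hypothetical; bus numbers appear only in comments. Cell selection via memory select 45 is left out here and treated as already done.

```python
class DspCellMemory:
    """Eight 3-bit data stores feeding a 3-8 decoder (Figure 17), modelled behaviourally."""

    def __init__(self) -> None:
        self.stores = [0] * 8  # the 8 x 3 bit data stores

    def write(self, store: int, data: int) -> None:
        """Write a 3-bit code into one store (write enable via instruction update bus 44,
        data from memory data bus 46)."""
        if not (0 <= store < 8 and 0 <= data < 8):
            raise ValueError("store index and data must each fit in 3 bits")
        self.stores[store] = data

    def selected_option(self, store: int) -> list[int]:
        """Read-enable one store (instruction bus 42) and decode its 3-bit code with
        Decoder 80 into a one-hot enable over the logic cell's eight options."""
        if not 0 <= store < 8:
            raise ValueError("store index must fit in 3 bits")
        code = self.stores[store]
        return [1 if option == code else 0 for option in range(8)]


cell = DspCellMemory()
cell.write(store=2, data=6)     # hypothetical: store 2 selects option 6
print(cell.selected_option(2))  # [0, 0, 0, 0, 0, 0, 1, 0]
```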

A novel adder structure which can be configured by the device will now be described with reference to Figures 18 to 21. A 16-bit adder is illustrated in Figure 18 and indicated generally by numeral 60. The adder comprises a plurality of carry select adders 62 forming a first multi-bit adder block 64 and a second multi-bit adder block 66. The adder 60 sums two 16 bit words indicated as a1, a2, a3.....a16 and b1, b2, b3.....b16 in order to derive a sum indicated by s1, s2, s3.....s16 and carry element 'Cout'.

First multi-bit adder block 64 sums the eight least significant bits of each 16 bit word, and for each bit there is an associated carry select adder 62. Each carry select adder comprises two inputs An, Bn (where 'n' is the number of the bit), output 68, carry in 70, carry out 72 and first and second 2:1 multiplexers 74, 76. The first input to the first multiplexer 74 is equal to the value of An + Bn assuming the carry in is '0', and the second input assumes the carry in to be '1'. The output Sn is selected by the carry in 70.

The two inputs to the second multiplexer 76 are equal to the carry resulting from the sum of An and Bn with the carry in being equal to '0' and '1'. The carry out 72 is selected by carry in 70. Obviously, the carry in to the first carry select adder will be equal to '0'.
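A single carry select adder cell, as just described, can be modelled at bit level in Python. This is a behavioural sketch (the function name is hypothetical): both candidate (sum, carry) pairs are formed before the carry in is known, and the carry in only steers the two 2:1 multiplexers.

```python
def carry_select_cell(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One carry select adder cell (62 in Figure 18) modelled at bit level.
    Both candidate results are formed in advance; the carry in only steers the 2:1 muxes."""
    # Candidate (sum, carry) pairs for carry in = '0' and carry in = '1'.
    sum_if_0, carry_if_0 = (a + b) & 1, (a + b) >> 1
    sum_if_1, carry_if_1 = (a + b + 1) & 1, (a + b + 1) >> 1
    s = sum_if_1 if carry_in else sum_if_0              # multiplexer 74 -> output 68
    carry_out = carry_if_1 if carry_in else carry_if_0  # multiplexer 76 -> carry out 72
    return s, carry_out


# Full truth table: (a, b, carry_in) -> (sum, carry_out)
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, "->", carry_select_cell(a, b, c))
```

The truth table matches an ordinary full adder; the point of the structure is timing, not a different result.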

The second multi-bit adder block 66 sums the eight most significant bits of each 16 bit word, and for each bit there are two associated carry select adders 78, 80. Each of the carry select adders 78, 80 is constructed in a similar manner to that described above. Carry select adders 78 sum the two eight bit words, ie. a9, a10.....a16 and b9, b10.....b16, assuming that the carry out from the first adder block 64 is '1', and carry select adders 80 assume that the carry out is '0'. Therefore, for each bit two outputs are calculated and fed into an associated multiplexer 82. The output providing Sn is selected by the carry out from the first adder block 64.
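The complete 16-bit structure of Figure 18 can be sketched in Python as a behavioural model. Within each eight-bit block the per-bit multiplexer choice is modelled by adding the carry directly, which gives the same result; the helper names are hypothetical, and the selection of the final Cout by the block 64 carry is an assumption, since the text spells out the selection only for the sum bits.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Split an integer into `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Reassemble an integer from bits given least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def add_block(a_bits: list[int], b_bits: list[int], carry_in: int) -> tuple[list[int], int]:
    """An eight-bit adder block: one carry select cell per bit, LSB first.
    Each cell's muxes pick the precomputed result matching its carry in, which is
    behaviourally the same as adding the carry directly, as done here."""
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        sums.append(total & 1)
        carry = total >> 1
    return sums, carry


def add16_carry_select(x: int, y: int) -> tuple[int, int]:
    """16-bit addition in the style of Figure 18; returns (sum mod 2**16, carry out)."""
    xb, yb = to_bits(x, 16), to_bits(y, 16)
    # Block 64: the eight least significant bits, with a carry in of '0'.
    low, carry_mid = add_block(xb[:8], yb[:8], 0)
    # Block 66: the eight most significant bits for both possible carries, computed
    # alongside block 64 (adders 80 assume '0', adders 78 assume '1').
    high_if_0, cout_if_0 = add_block(xb[8:], yb[8:], 0)
    high_if_1, cout_if_1 = add_block(xb[8:], yb[8:], 1)
    # Multiplexers 82: the carry out of block 64 picks the correct upper half.
    # ASSUMPTION: the final Cout is selected the same way (not spelled out in the text).
    high, carry_out = (high_if_1, cout_if_1) if carry_mid else (high_if_0, cout_if_0)
    return from_bits(low + high), carry_out


print(add16_carry_select(0x00FF, 0x0001))  # (256, 0)
print(add16_carry_select(0xFFFF, 0x0001))  # (0, 1)
```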

In operation, the first adder block calculates the addition of the eight least significant bits and produces a carry out value. Simultaneously, the second adder block calculates the two possible sums of the addition of the most significant bits, and the correct sum is selected by the carry out produced by adder block 64. In consequence, the time delay to calculate a 16 bit addition is taken to be the delay in the addition of the first eight bits (8ADD) plus the delay in selecting the sum of the last eight bits, ie. one multiplexer delay (MUX).

For each additional eight bit adder block the time delay increases by one multiplexer delay. For example, a thirty-two bit adder would result in a propagation delay of 8ADD + 3 × MUX. In consequence, the adder structure described results in an improved speed of operation compared to that of a conventional adder structure.
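The delay rule above generalises to any width built from eight-bit blocks. A small Python helper, assuming the rule holds unchanged as blocks are added (as the text states) and rounding a partial final block up to a whole block, reproduces the 16-bit and 32-bit figures; the function name is hypothetical.

```python
import math

BLOCK_BITS = 8  # width of each adder block in Figure 18


def carry_select_delay(n_bits: int) -> tuple[int, int]:
    """Propagation delay of the described adder as (8ADD terms, MUX terms):
    one eight-bit addition, plus one multiplexer delay per additional eight-bit block."""
    blocks = math.ceil(n_bits / BLOCK_BITS)
    return 1, blocks - 1


for width in (8, 16, 32, 64):
    adds, muxes = carry_select_delay(width)
    print(f"{width}-bit adder: {adds} x 8ADD + {muxes} x MUX")
```

This prints `1 x 8ADD + 1 x MUX` for 16 bits and `1 x 8ADD + 3 x MUX` for 32 bits, matching the figures in the text.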

Figure 20 illustrates an alternative cell structure wherein the two carry select adders, which would otherwise require two cells, can be replaced by a single configured cell.

Figure 21 illustrates a conventional circuit for a single-stage carry select adder, which may be used as an alternative to the circuit of Figure 19.

The operation of the device will now be described. Initially, as described above, the configuration provisions 40 are 'hard wired' or fixed with a DSP configuration 40a and a multiplier configuration 40b.

An external memory store (not shown) contains all the configuration data needed to control the controller and sequencer so that each of the data stores (36a-d, 38) in each cell can be programmed. A typical procedure for programming a data store is first to select the cell by memory select 45, then to select the data store to be write-enabled by instruction update bus 44, and finally to write data to the selected store via data bus 46.
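The three-step programming procedure can be sketched as a software model. Everything here is hypothetical: the `Cell` class, the mapping of stores 36a-36d and 38 to list indices, and the `apply_configuration` helper, which replays a list of (cell, store, data) writes of the kind the external memory is said to supply when a new configuration, such as a divide function, is loaded.

```python
from dataclasses import dataclass, field

# Hypothetical index layout: 0-3 = configuration cache stores 36a-36d,
# 4 = instruction cache 38.
NUM_STORES = 5


@dataclass
class Cell:
    """Writable data stores of one cell, modelled as integers."""
    stores: list[int] = field(default_factory=lambda: [0] * NUM_STORES)


def program_store(cells: list[Cell], memory_select: int,
                  write_enable: int, data: int) -> None:
    """Write one data store in one cell."""
    cell = cells[memory_select]  # step 1: memory select 45 picks the cell
    if not 0 <= write_enable < NUM_STORES:
        raise IndexError("no such data store in this cell")
    # step 2: bus 44 write-enables one store; step 3: bus 46 supplies the data
    cell.stores[write_enable] = data


def apply_configuration(cells: list[Cell],
                        writes: list[tuple[int, int, int]]) -> None:
    """Replay (cell, store, data) triples, e.g. as read from external memory."""
    for memory_select, write_enable, data in writes:
        program_store(cells, memory_select, write_enable, data)
```

Only the cells and stores named in the write list change, which reflects the point that a reconfiguration touches just the stores needed for the new function.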

Each of the four data stores of the configuration cache 36 contains sufficient configuration data to select the input to the logic cell 22 and to also select one of the functions contained within the logic cell.

The initial boot-up operation of the device results in either of the primary configurations 40a, 40b, according to the boot-up instruction. Thus, for example, the DSP or multiplier configuration is established.

However, if the device is required to implement another configuration, e.g. a divide function, then the controller and sequencer 5 selects and write-enables the required data store of the configuration cache 36 of each cell needed to implement the configuration. The external memory supplies the data specifying which cells and data stores are to be selected in order to implement the required configuration.

There is also the option of adopting other programmed configurations from the configuration cache, and of writing and substituting other configurations.

Thus, for the example given, the four configurations available from the configuration cache may not be sufficient. Software programming can be used to implement another configuration. The programmer will be able to refer to the technical specifications for the device and determine how the desired function/configuration can be implemented (for example, many possible architecture changes will be listed, perhaps in terms of a load instruction). Thus, whilst load instructions 1-4 might represent the most typical configurations to be stored in the configuration cache, the programmer may determine from the technical specification that, for example, load instruction 33 is required, and will then have that instruction loaded into the configuration cache.

There will be instances where more configurations are required to process the incoming data than can be stored in the cell memory for access at clock speed. However, this difficulty can be overcome by re-programming a "redundant" configuration cache with the "additional" configuration data in advance of its requirement, by including the re-configuration instruction in the software programme. The sequencer can control re-configuration at clock speed, whilst the data from the configuration is held safe in the latch cells. The four configurations (36a-36d) of the cache can be reused in different combinations at different cell sites. This is facilitated by instruction cache (38), which can select different local cell configurations from a global instruction placed on instruction bus 42.
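The reuse of the four cached configurations through the instruction cache can be modelled as a two-level lookup: a global instruction on bus 42 is translated, separately in each cell, into a local cache slot. The class name, the string configuration labels and the `preload` helper (which refreshes a slot not used by the current instruction, loosely mirroring the "redundant" cache re-programming described above) are illustrative assumptions rather than details from the text.

```python
from dataclasses import dataclass


@dataclass
class CellSelect:
    """Hypothetical per-cell selection state.

    cache:       four configuration entries (stores 36a-36d)
    instr_cache: global instruction code -> local cache slot (store 38)
    """
    cache: list[str]
    instr_cache: dict[int, int]

    def active(self, global_instr: int) -> str:
        """Resolve the instruction on bus 42 to this cell's configuration."""
        return self.cache[self.instr_cache[global_instr]]


def broadcast(cells: list[CellSelect], global_instr: int) -> list[str]:
    """Apply one global instruction; every cell decodes it locally."""
    return [cell.active(global_instr) for cell in cells]


def preload(cell: CellSelect, slot: int, new_config: str,
            current_instr: int) -> None:
    """Overwrite a cache slot ahead of need, refusing the slot in use."""
    if cell.instr_cache[current_instr] == slot:
        raise ValueError("slot is used by the current configuration")
    cell.cache[slot] = new_config
```

Two cells holding the same four entries but different instruction maps therefore produce different configurations from the same global instruction, which is how the cached configurations can be combined differently at different cell sites.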

#### Claims

1. A configurable semi-conductor integrated circuit in which an area (1) thereof is formed with a plurality of cells (2) each having at least one function and inter-connections with at least some other said cells (2), characterised in that at least some of the plurality of cells (2) have interconnections (25) which are electrically selectable as to their conduction state, and at least some of the plurality of cells (2) have interconnections (YA-YD) which are pre-wired, each cell has two or more possible configurations, each configuration being defined by the cell function and/or its interconnection with other cells according to cell configuration data, and further comprising means (36, 38, 40) storing configuration data for at least two cell configurations (per cell) and means (30, 32, 34, 42, 48) to enable one of the possible cell configurations according to the cell configuration data selected.

2. A configurable semi-conductor integrated circuit as claimed in claim 1 in which means (36, 38, 40) storing at least two cell configurations are present in the cell.

3. An integrated circuit as claimed in claim 1 or 2 in which the means for selecting the required cell configuration comprises an instruction bus (42) communicating with the said configuration data store.

4. An integrated circuit as claimed in claim 1, 2 or 3 in which at least one of the cell configurations is pre-wired (40a, 40b) to configure the integrated circuit with an application specific function when selected.

5. An integrated circuit as claimed in claim 4 in which there are two pre-wired (40a, 40b) application specific functions.

6. An integrated circuit as claimed in any one of the preceding claims in which there is at least one programmable cell configuration.

7. An integrated circuit as claimed in any one of the preceding claims in which there are both pre-wired and programmable cell configurations.

8. An integrated circuit as claimed in claim 6 or 7 further comprising a write enable bus (44), and a data bus (46) communicating with the means (36, 38) storing the cell configuration data for the purpose of rewriting data to the store for re-programming purposes.

9. An integrated circuit as claimed in any one of the preceding claims further comprising means storing a plurality of configuration selection instructions, an instructions select bus (42) communicating with said means and an output signal path for selecting the required configuration data store to be implemented or directly effecting cell configuration.

10. An integrated circuit as claimed in claim 9 further comprising an instruction write bus (44) and an instruction data bus (46) for writing to the instruction storing means (36, 38).

- 11. An integrated circuit as claimed in any one of the preceding claims in which means (54) is provided to preserve the output between configurations.
- 12. An integrated circuit as claimed in claim 11 in which said means comprises a latch (54) wherein each cell incorporates a latch to preserve its output.
- 13. An integrated circuit as claimed in any one of the preceding claims in which the cells are optimised for a primary function.
- 14. An integrated circuit as claimed in claim 13 comprising cells which are optimised for different primary functions.
- 15. An integrated circuit as claimed in any one of the preceding claims including means (60) to reduce transient current when switching between configurations.
- 16. An integrated circuit as claimed in claim 15 in which said means comprises a controllable buffer (60) in the output line of each cell.
- 17. An integrated circuit as claimed in any one of the preceding claims further comprising sequencer means (5) to control the availability and selection of the configuration.
- 18. An integrated circuit as claimed in any one of the preceding claims comprising decode means (30, 32, 34, 48) in each cell (2) to decode configuration state to control the configuration of each cell.
- 19. An integrated circuit as claimed in any one of claims 4 or 5, 13 or 14 in which the configuration data store corresponding to the primary or application specific function of the cell is contained within the device in a non-volatile memory.
- 20. An integrated circuit as claimed in any one of claims 13 or 14 in which the pre-wired (hidden) interconnect resources interconnect optimised cells for efficient implementation of the primary (application specific) functions.
- 21. A multi-bit adder for summing at least two multi-bit words comprising a first multi-bit adder block (64) for summing the least significant bits and at least one further multi-bit adder block (66) for summing the most significant bits and having sum selection means wherein said further multi-bit adder block calculates the two possible sums resulting from a carry out from the previous block being equal to '0' and '1' respectively and wherein the sum selection means selects the sum of the further multi-bit adder block according to the carry out calculated from the previous block.

- 22. A method of configuring a configurable semiconductor integrated circuit having a plurality of cells (2) with at least two configuration possibilities in which a sequencer (5) is programmed with data to facilitate selection of the required cell configuration.
- 23. A method as claimed in claim 22 further comprising inputting and storing cell configuration data.
- 24. A method as claimed in claim 22 or 23 further comprising programming the sequencer with data to write over previously stored configuration data at a prescribed point in operation of the circuit.
- 25. A configurable semi-conductor integrated circuit characterised in that circuit configuration is changed according to a pre-programmed sequence of configurations during operation of the device.
- 26. An integrated circuit as claimed in claim 25 in which an area thereof is formed with a plurality of cells, each cell having two or more possible configurations, each configuration being defined by the cell function and/or its interconnection with other cells according to configuration data.































